Namespace crawling

The process of extracting OPC UA namespace models from an external server is referred to as crawling the external server.

The crawler works in conjunction with an OPC UA client module configured on the hive instance that receives the local copy of the chosen namespaces from the external server. The crawler uses the connection properties of the UA client module and creates namespace items in the module representing the namespaces crawled on the remote server. The namespace items contain the configuration for how each namespace should be treated, and have configuration properties similar to the module properties of a semantics module.

The crawler process consists of several phases:

  • Node discovery. This phase starts at a predefined node (the root) and follows all references found by browsing it. It then browses every target found in the previous step, and repeats until no new nodes are found. The result is the set of all node ids reachable from the root node, along with their node classes, for all namespaces hosted on the server.
  • Namespace extraction. This phase extracts one or more specific namespaces to individual namespace databases. Since all nodes and their node classes are known from the first phase, attributes are read only for the nodes in the namespaces being extracted.
  • Namespace deployment. This phase loads the resulting namespace databases into the hive instance. The semantics runtime in hive then looks for variables that should be subscribed and generates hive items for these values, according to the naming rules defined in the namespace service configuration. Finally, the model is scanned for event sources, and an event hierarchy is built for the discovered event sources.
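
The node discovery phase is essentially a breadth-first traversal of the address space. The following is a minimal Python sketch of that idea, not the crawler's actual implementation. The `browse` callback, the `discover_nodes` function and the sample address space are hypothetical names introduced for illustration; a real crawl would call the OPC UA Browse service instead.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical browse callback: given a node id, return the
# (target node id, target node class) pairs of its references.
BrowseFn = Callable[[str], Iterable[Tuple[str, str]]]


def discover_nodes(root: str, root_class: str, browse: BrowseFn) -> Dict[str, str]:
    """Return every node id reachable from root, mapped to its node class."""
    discovered = {root: root_class}
    pending = deque([root])
    while pending:
        node_id = pending.popleft()
        for target_id, node_class in browse(node_id):
            # Only queue targets that have not been seen before, so that
            # cyclic references do not cause endless browsing.
            if target_id not in discovered:
                discovered[target_id] = node_class
                pending.append(target_id)
    return discovered


# Illustrative address space (assumption), including a back reference.
address_space = {
    "Root": [("Objects", "Object"), ("Types", "Object")],
    "Objects": [("Pump1", "Object")],
    "Pump1": [("Pump1.Speed", "Variable"), ("Objects", "Object")],
}

nodes = discover_nodes("Root", "Object", lambda n: address_space.get(n, []))
```

The `discovered` dictionary plays the role of the node cache discussed under the memory footprint heading: it is what lets the traversal skip targets that are already processed.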

Memory footprint

For speed, all discovered nodes are cached so the crawler can quickly determine whether the target of a browse reference has already been processed. This means that the crawler may use a lot of memory while the remote server is being crawled, but the memory is released as soon as the crawl completes. The memory footprint can be reduced, at the cost of a longer crawl. The cache is controlled by a high and a low watermark configuration setting, where the high watermark defines the maximum number of nodes held by the cache. When the cache reaches the high watermark, it flushes the (high watermark - low watermark) least referenced nodes, so that only low watermark entries remain.

If memory is not an issue, it is also possible to use an in-memory database (IMDB) to hold the complete address space of the crawled server. Disabling the cache limits and enabling IMDB may speed up the crawl by as much as 50-70%.
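
The watermark behaviour of the node cache can be illustrated with a small Python sketch. The class name `WatermarkCache`, its methods and the way references are counted are assumptions made for this example; the crawler's real cache and its configuration property names may differ.

```python
from typing import Dict


class WatermarkCache:
    """Node cache that flushes down to a low watermark when it gets full."""

    def __init__(self, low_watermark: int, high_watermark: int) -> None:
        if not 0 < low_watermark < high_watermark:
            raise ValueError("expected 0 < low_watermark < high_watermark")
        self.low = low_watermark
        self.high = high_watermark
        self._references: Dict[str, int] = {}  # node id -> times referenced

    def __contains__(self, node_id: str) -> bool:
        return node_id in self._references

    def __len__(self) -> int:
        return len(self._references)

    def touch(self, node_id: str) -> bool:
        """Count a reference to node_id; return True if it was already cached."""
        known = node_id in self._references
        self._references[node_id] = self._references.get(node_id, 0) + 1
        if len(self._references) >= self.high:
            self._flush()
        return known

    def _flush(self) -> None:
        # Keep the low-watermark most referenced nodes, drop the rest.
        keep = sorted(self._references, key=self._references.get, reverse=True)
        self._references = {n: self._references[n] for n in keep[: self.low]}


cache = WatermarkCache(low_watermark=2, high_watermark=4)
for node_id in ["a", "a", "a", "b", "b", "c", "d"]:
    cache.touch(node_id)
```

In this sketch the fourth distinct node triggers a flush that keeps only the two most referenced entries. A node flushed this way looks new if it is referenced again later, which is one plausible reason why tighter cache limits make a crawl slower.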

Namespace extraction

Namespace databases are by default deployed to the subfolder "semantics\proxies" in the hive instance's configuration folder. When the namespace databases are extracted, some references may be included in more than one namespace database. This occurs because references do not belong to any particular namespace. The crawler tries to include all references that have a source or target node in the namespace being extracted, with one exception: if the reference type is "HasType", the reference is only included in the namespace defining the instance (if the type is stored in a different namespace).
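
The inclusion rule for references can be written down as a small Python function. This is only a sketch of the rule as described above: the function name `namespaces_including` is hypothetical, namespaces are represented by their index, and it is assumed that the instance is the source node of a "HasType" reference.

```python
from typing import Set


def namespaces_including(source_ns: int, target_ns: int, reference_type: str) -> Set[int]:
    """Return the namespaces whose database would contain the reference."""
    if reference_type == "HasType":
        # Exception: only the namespace defining the instance gets the
        # reference (assumption: the instance is the source node).
        return {source_ns}
    # General rule: the reference is included wherever its source or
    # target node lives, so it may end up in two namespace databases.
    return {source_ns, target_ns}
```

For example, a reference from a node in namespace 2 to a node in namespace 3 is written to both databases, while a "HasType" reference from an instance in namespace 2 to a type in namespace 0 is written only to the database for namespace 2.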

Item name generation

Values that should be subscribed on the source server need hive function items in order to transfer the values to the local UA variable. The crawler assigns such function items to all variable nodes that are of BaseDataVariableType or any of its subtypes. To determine which nodes are base data variables, all namespaces defining the types that make up the variable types must be available on the local hive UA server. That means the types must either be predefined types from namespace 0, part of a namespace that is also crawled from the remote server, or part of a namespace imported on the local UA server, such as ISA95. If the type of a variable, or any part of it, is not known when the namespace is loaded, the system cannot determine that the variable is actually a base data variable, and no function item is generated for that variable.
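
Why a missing type namespace prevents function item generation can be seen from a sketch of the type check. The Python below walks a variable's type definition up its supertype chain until it reaches BaseDataVariableType (the basedatavariabletype mentioned above). The function name, the `supertypes` dictionary and the vendor type names are hypothetical; the actual lookup in the semantics runtime is not documented here.

```python
from typing import Dict, Optional

BASE_DATA_VARIABLE_TYPE = "BaseDataVariableType"


def is_base_data_variable(type_id: str, supertypes: Dict[str, Optional[str]]) -> bool:
    """Return True if type_id is BaseDataVariableType or one of its subtypes.

    supertypes maps each type known to the local server to its supertype
    (None for a type without a supertype).
    """
    seen = set()
    current: Optional[str] = type_id
    while current is not None and current not in seen:
        if current == BASE_DATA_VARIABLE_TYPE:
            return True
        if current not in supertypes:
            # Part of the type chain is not loaded locally, so the
            # variable cannot be identified as a base data variable.
            return False
        seen.add(current)
        current = supertypes[current]
    return False


# Types known locally (illustrative). "UnloadedVendorBaseType" stands for
# a type defined in a namespace that is neither crawled nor imported.
known_types = {
    "BaseVariableType": None,
    "BaseDataVariableType": "BaseVariableType",
    "PropertyType": "BaseVariableType",
    "DataItemType": "BaseDataVariableType",
    "VendorTemperatureType": "DataItemType",
    "VendorFlowType": "UnloadedVendorBaseType",
}
```

Here `VendorTemperatureType` resolves to BaseDataVariableType and would get a function item, while `VendorFlowType` would not, because its supertype is unknown on the local server.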

Datatype support

All primitive scalar datatypes are supported, but only a small subset of struct datatypes is supported.

The following are the supported struct datatypes:

  • Range
  • EUInformation
  • TimeZoneDataType

Variables containing arrays or matrices of a primitive datatype (with the exception of the special ByteArray datatype) cannot be stored in the namespace database. That means properties holding such values cannot read their values from the database. All other aspects of such variables are available, but the data value cannot be read. However, base data variables connected to a hive function item work for arrays and matrices, as long as the underlying datatype is supported by hive.
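
The storage rules above can be summarised as a small decision function. This Python sketch is an interpretation of the text, not the crawler's code: the function name and parameters are hypothetical, and arrays of struct datatypes are assumed to be unsupported as well, which the text does not state explicitly.

```python
SUPPORTED_STRUCTS = {"Range", "EUInformation", "TimeZoneDataType"}


def can_store_value(datatype: str, is_primitive: bool, is_array: bool) -> bool:
    """Return True if a value of this kind can be kept in the namespace database."""
    if datatype == "ByteArray":
        # The special ByteArray datatype is the stated exception.
        return True
    if is_array:
        # Arrays and matrices are not stored (assumed to hold for structs too).
        return False
    # Scalars: all primitives, plus the short list of supported structs.
    return is_primitive or datatype in SUPPORTED_STRUCTS
```

A variable for which this returns False still exists in the local model with its other attributes; only its stored data value is unavailable unless a hive function item supplies it.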

Error recovery

If for any reason the crawler loses connectivity with the source server during the first phase of crawling (node discovery), the crawl is aborted.

Once the crawler gets past this stage and has started to populate the attributes of the nodes to be exported, it has some recovery built in. If the connection is lost, or the OPC UA session becomes invalid for some other reason, the crawler attempts to reconnect to the source server. It tries 5 times to re-establish the connection before giving up completely.
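
The reconnect behaviour can be sketched as a bounded retry loop. The Python below is an illustration under assumptions: the `connect` callback, the use of `ConnectionError` and the optional delay between attempts are hypothetical, and only the limit of 5 attempts comes from the description above.

```python
import time
from typing import Callable

MAX_RECONNECT_ATTEMPTS = 5  # limit taken from the description above


def reconnect(connect: Callable[[], None],
              max_attempts: int = MAX_RECONNECT_ATTEMPTS,
              delay_seconds: float = 0.0) -> bool:
    """Try to re-establish the connection; return False after giving up."""
    for attempt in range(1, max_attempts + 1):
        try:
            connect()  # hypothetical callback that raises ConnectionError on failure
            return True
        except ConnectionError:
            # Optionally wait before the next attempt (the delay is an assumption).
            if attempt < max_attempts and delay_seconds > 0:
                time.sleep(delay_seconds)
    return False
```

A caller in the extraction phase would resume reading attributes when `reconnect` returns True and abort the crawl when it returns False.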